fix(dflash): derive n_target_layers fallback in gguf_draft_loader #138

Merged
davide221 merged 2 commits into Luce-Org:main from javierpazo:xabicasa/dflash-gguf-draft-loader-target-layers on May 12, 2026
Conversation

@javierpazo
Contributor

fix(dflash): derive n_target_layers fallback in gguf_draft_loader

Follow-up to merged #79 ("read model params from GGUF at runtime,
support any qwen35 size"). #79 covers the target loader and the
common drafter fields, but the fallback chain in `gguf_draft_loader`
still requires the legacy `dflash.n_target_layers` key to be
present.

Drafters published with the new metadata key naming
(`dflash-draft.dflash.target_layer_ids` plus
`n_target_features`) hit the path where the legacy key is missing
and the loader fails. Concrete case: the published Q8 GGUF drafter
for Qwen3.6-27B-DFlash.

This change derives `n_target_layers` in two steps:

  1. If `target_layer_ids` is present, use its length.
  2. Otherwise, if `n_target_features` and `n_embd` are both
     present, use `n_target_features / n_embd` (with a sanity
     check that the division is exact).

If neither is available, the loader still fails with the same
honest error as before. The legacy key path is untouched.
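
The fallback order described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `DraftMeta` struct and the `derive_n_target_layers` function are hypothetical names, the metadata is assumed to have already been read into optional fields, and treating an empty `target_layer_ids` array or an inexact division as "not available" is an assumption about how the sanity check behaves.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical view of the drafter metadata. Field names mirror the keys
// mentioned in this PR; the struct itself is not the loader's real type.
struct DraftMeta {
    std::optional<uint32_t> n_target_layers;               // legacy dflash.n_target_layers
    std::optional<std::vector<int32_t>> target_layer_ids;  // new-style key
    std::optional<uint32_t> n_target_features;
    std::optional<uint32_t> n_embd;
};

// Returns the layer count, or std::nullopt when nothing usable is present,
// in which case the caller would report the same error as before.
inline std::optional<uint32_t> derive_n_target_layers(const DraftMeta & m) {
    // Legacy key path: unchanged, used whenever the key is present.
    if (m.n_target_layers) {
        return *m.n_target_layers;
    }
    // Step 1: use the length of target_layer_ids.
    // Assumption: an empty array is treated as if the key were absent.
    if (m.target_layer_ids && !m.target_layer_ids->empty()) {
        return static_cast<uint32_t>(m.target_layer_ids->size());
    }
    // Step 2: n_target_features / n_embd, only if the division is exact.
    if (m.n_target_features && m.n_embd && *m.n_embd != 0) {
        if (*m.n_target_features % *m.n_embd != 0) {
            return std::nullopt;  // assumption: inexact division counts as failure
        }
        return *m.n_target_features / *m.n_embd;
    }
    return std::nullopt;
}
```

With a five-entry `target_layer_ids` the sketch returns 5; with only `n_target_features` and `n_embd` set it returns their quotient when the division is exact, and nothing otherwise.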

Validation (RTX 6000 Ada sm_89, Qwen3.6-27B Heretic Q4_K_M target,
Q8 GGUF drafter via the new metadata):

Loaded `SWA layers: 4/5`, decode 21.06 tok/s, no fallback chain
errors during init.

Verification vs existing community PRs:

COMP-COMPL with #79 (merged 2026-05-03). #79 covered target
loader and drafter fields generically. This PR is a small
follow-up for the case where only the new metadata is present
on the drafter side.

Author: Javier Pazo <xabicasa@gmail.com>


@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 1 file

@davide221 davide221 merged commit 94f15b4 into Luce-Org:main May 12, 2026
1 check passed
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request May 12, 2026
…e-Org#119/Luce-Org#149 reorg

Brings in HIP/Strix Halo backend (PRs Luce-Org#119, Luce-Org#149), dflash source-layout
reorg (Luce-Org#138 — qwen35/, draft/, qwen3/ subdirs), GGUF draft loader fixes,
daemon ubatch defaults, prefix cache + streaming tool-call fixes.

Conflicts resolved:
 - dflash/CMakeLists.txt: take main's reorganized source paths; keep
   our gemma4_*.cpp entries; preserve the DFLASH27B_MIN_SM backwards-
   compat shim so gemma4_dflash_graph.cpp:621 keeps building under
   main's renamed _dflash27b_cuda_min_sm variable.
 - dflash/deps/llama.cpp: keep our submodule pointer (eb3676f40 on
   feature/tq3-kv-cache-clean). Main's c79573c9b lacks the TQ3
   dispatcher fixes required for Gemma4 KV correctness; if useful
   upstream commits land there, they should be cherry-picked into
   our submodule branch separately.

Verified: TQ3 64K MTP gamma=2 pflash post-merge:
  decode 10.58 tok/s, prefill 463 tok/s, accept 0.78 — matches
  pre-merge baseline (10.25 / 445 / 0.78) within noise.
2 participants